<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Update by query API | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'docs-update-by-query.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/docs-update-by-query.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/docs-update-by-query.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/docs-update-by-query.html" rel="nofollow" target="_blank">../en/docs-update-by-query.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="rest-apis.html">REST APIs</a></span>
»
<span class="breadcrumb-link"><a href="docs.html">Document APIs</a></span>
»
<span class="breadcrumb-node">Update by query API</span>
</div>
<div class="navheader">
<span class="prev">
<a href="docs-update.html">« Update API</a>
</span>
<span class="next">
<a href="docs-multi-get.html">Multi get (mget) API »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="docs-update-by-query"></a>Update by query API<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h2>
</div></div></div>
<p>The simplest usage of <code class="literal">_update_by_query</code> just performs an update on every
document in the index without changing the source. This is useful to
<a class="xref" href="docs-update-by-query.html#picking-up-a-new-property" title="Pick up a new property">pick up a new property</a> or some other online
mapping change. Here is the API:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query?conflicts=proceed</pre>
</div>
<div class="console_widget" data-snippet="snippets/1503.console"></div>
<p>That will return something like this:</p>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "took" : 147,
  "timed_out": false,
  "updated": 120,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "total": 120,
  "failures" : [ ]
}</pre>
</div>
<p><code class="literal">_update_by_query</code> gets a snapshot of the index when it starts and indexes what
it finds using <code class="literal">internal</code> versioning. That means you’ll get a version
conflict if the document changes between the time when the snapshot was taken
and when the index request is processed. When the versions match, the document
is updated and the version number is incremented.</p>
<div class="note admon">
<div class="icon"></div>
<div class="admon_content">
<p>Since <code class="literal">internal</code> versioning does not support the value 0 as a valid
version number, documents with version equal to zero cannot be updated using
<code class="literal">_update_by_query</code> and will fail the request.</p>
</div>
</div>
<p>All update and query failures cause the <code class="literal">_update_by_query</code> to abort and are
returned in the <code class="literal">failures</code> of the response. The updates that have been
performed still stick. In other words, the process is not rolled back, only
aborted. While the first failure causes the abort, all failures that are
returned by the failing bulk request are returned in the <code class="literal">failures</code> element; therefore
it’s possible for there to be quite a few failed entities.</p>
<p>If you want to simply count version conflicts, and not cause the <code class="literal">_update_by_query</code>
to abort, you can set <code class="literal">conflicts=proceed</code> on the url or <code class="literal">"conflicts": "proceed"</code>
in the request body. The first example does this because it is just trying to
pick up an online mapping change, and a version conflict simply means that the
conflicting document was updated between the start of the <code class="literal">_update_by_query</code>
and the time when it attempted to update the document. This is fine because
that update will have picked up the online mapping update.</p>
<p>Back to the API format, this will update tweets from the <code class="literal">twitter</code> index:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query?conflicts=proceed</pre>
</div>
<div class="console_widget" data-snippet="snippets/1504.console"></div>
<p>You can also limit <code class="literal">_update_by_query</code> using the
<a class="xref" href="query-dsl.html" title="Query DSL">Query DSL</a>. This will update all documents from the
<code class="literal">twitter</code> index for the user <code class="literal">kimchy</code>:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query?conflicts=proceed
{
  "query": { <a id="CO555-1"></a><i class="conum" data-value="1"></i>
    "term": {
      "user": "kimchy"
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1505.console"></div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO555-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>The query must be passed as a value to the <code class="literal">query</code> key, in the same
way as the <a class="xref" href="search-search.html" title="Search API">Search API</a>. You can also use the <code class="literal">q</code>
parameter in the same way as the search API.</p>
</td>
</tr>
</table>
</div>
<p>So far we’ve only been updating documents without changing their source. That
is genuinely useful for things like
<a class="xref" href="docs-update-by-query.html#picking-up-a-new-property" title="Pick up a new property">picking up new properties</a> but it’s only half the
fun. <code class="literal">_update_by_query</code> <a class="xref" href="modules-scripting-using.html" title="How to use scripts">supports scripts</a> to update
the document. This will increment the <code class="literal">likes</code> field on all of kimchy’s tweets:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query
{
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1506.console"></div>
<p>Just as in <a class="xref" href="docs-update.html" title="Update API">Update API</a> you can set <code class="literal">ctx.op</code> to change the
operation that is executed:</p>
<div class="informaltable">
<table border="0" cellpadding="4px">
<colgroup>
<col>
<col>
</colgroup>
<tbody valign="top">
<tr>
<td valign="top">
<p>
<code class="literal">noop</code>
</p>
</td>
<td valign="top">
<p>
Set <code class="literal">ctx.op = "noop"</code> if your script decides that it doesn’t have to make any
changes. That will cause <code class="literal">_update_by_query</code> to omit that document from its updates.
 This no operation will be reported in the <code class="literal">noop</code> counter in the
<a class="xref" href="docs-update-by-query.html#docs-update-by-query-response-body" title="Response body">response body</a>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">delete</code>
</p>
</td>
<td valign="top">
<p>
Set <code class="literal">ctx.op = "delete"</code> if your script decides that the document must be
 deleted. The deletion will be reported in the <code class="literal">deleted</code> counter in the
<a class="xref" href="docs-update-by-query.html#docs-update-by-query-response-body" title="Response body">response body</a>.
</p>
</td>
</tr>
</tbody>
</table>
</div>
<p>Setting <code class="literal">ctx.op</code> to anything else is an error. Setting any
other field in <code class="literal">ctx</code> is an error.</p>
<p>Note that we stopped specifying <code class="literal">conflicts=proceed</code>. In this case we want a
version conflict to abort the process so we can handle the failure.</p>
<p>This API doesn’t allow you to move the documents it touches, just modify their
source. This is intentional! We’ve made no provisions for removing the document
from its original location.</p>
<p>It’s also possible to do this whole thing on multiple indexes at once, just
like the search API:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter,blog/_update_by_query</pre>
</div>
<div class="console_widget" data-snippet="snippets/1507.console"></div>
<p>If you provide <code class="literal">routing</code> then the routing is copied to the scroll query,
limiting the process to the shards that match that routing value:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query?routing=1</pre>
</div>
<div class="console_widget" data-snippet="snippets/1508.console"></div>
<p>By default <code class="literal">_update_by_query</code> uses scroll batches of 1000. You can change the
batch size with the <code class="literal">scroll_size</code> URL parameter:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query?scroll_size=100</pre>
</div>
<div class="console_widget" data-snippet="snippets/1509.console"></div>
<p><code class="literal">_update_by_query</code> can also use the <a class="xref" href="ingest.html" title="Ingest node">Ingest node</a> feature by
specifying a <code class="literal">pipeline</code> like this:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT _ingest/pipeline/set-foo
{
  "description" : "sets foo",
  "processors" : [ {
      "set" : {
        "field": "foo",
        "value": "bar"
      }
  } ]
}
POST twitter/_update_by_query?pipeline=set-foo</pre>
</div>
<div class="console_widget" data-snippet="snippets/1510.console"></div>
<h4>
<a id="_url_parameters"></a>URL parameters<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>In addition to the standard parameters like <code class="literal">pretty</code>, the Update By Query API
also supports <code class="literal">refresh</code>, <code class="literal">wait_for_completion</code>, <code class="literal">wait_for_active_shards</code>, <code class="literal">timeout</code>,
and <code class="literal">scroll</code>.</p>
<p>Sending the <code class="literal">refresh</code> will update all shards in the index being updated when
the request completes. This is different than the Update API’s <code class="literal">refresh</code>
parameter, which causes just the shard that received the new data to be indexed.
Also unlike the Update API it does not support <code class="literal">wait_for</code>.</p>
<p>If the request contains <code class="literal">wait_for_completion=false</code> then Elasticsearch will
perform some preflight checks, launch the request, and then return a <code class="literal">task</code>
which can be used with <a class="xref" href="docs-update-by-query.html#docs-update-by-query-task-api" title="Works with the Task API">Tasks APIs</a>
to cancel or get the status of the task. Elasticsearch will also create a
record of this task as a document at <code class="literal">.tasks/task/${taskId}</code>. This is yours
to keep or remove as you see fit. When you are done with it, delete it so
Elasticsearch can reclaim the space it uses.</p>
<p><code class="literal">wait_for_active_shards</code> controls how many copies of a shard must be active
before proceeding with the request. See <a class="xref" href="docs-index_.html#index-wait-for-active-shards" title="Active shards">here</a>
for details. <code class="literal">timeout</code> controls how long each write request waits for unavailable
shards to become available. Both work exactly how they work in the
<a class="xref" href="docs-bulk.html" title="Bulk API">Bulk API</a>. Because <code class="literal">_update_by_query</code> uses scroll search, you can also specify
the <code class="literal">scroll</code> parameter to control how long it keeps the "search context" alive,
e.g. <code class="literal">?scroll=10m</code>. By default it’s 5 minutes.</p>
<p><code class="literal">requests_per_second</code> can be set to any positive decimal number (<code class="literal">1.4</code>, <code class="literal">6</code>,
<code class="literal">1000</code>, etc.) and throttles the rate at which <code class="literal">_update_by_query</code> issues batches of
index operations by padding each batch with a wait time. The throttling can be
disabled by setting <code class="literal">requests_per_second</code> to <code class="literal">-1</code>.</p>
<p>The throttling is done by waiting between batches so that scroll that
<code class="literal">_update_by_query</code> uses internally can be given a timeout that takes into
account the padding. The padding time is the difference between the batch size
divided by the <code class="literal">requests_per_second</code> and the time spent writing. By default the
batch size is <code class="literal">1000</code>, so if the <code class="literal">requests_per_second</code> is set to <code class="literal">500</code>:</p>
<div class="pre_wrapper lang-txt">
<pre class="programlisting prettyprint lang-txt">target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds</pre>
</div>
<p>Since the batch is issued as a single <code class="literal">_bulk</code> request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is <code class="literal">-1</code>.</p>
<h4>
<a id="docs-update-by-query-response-body"></a>Response body<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>The JSON response looks like this:</p>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "took" : 147,
  "timed_out": false,
  "total": 5,
  "updated": 5,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures" : [ ]
}</pre>
</div>
<div class="informaltable">
<table border="0" cellpadding="4px">
<colgroup>
<col>
<col>
</colgroup>
<tbody valign="top">
<tr>
<td valign="top">
<p>
<code class="literal">took</code>
</p>
</td>
<td valign="top">
<p>
The number of milliseconds from start to end of the whole operation.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">timed_out</code>
</p>
</td>
<td valign="top">
<p>
This flag is set to <code class="literal">true</code> if any of the requests executed during the
update by query execution has timed out.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">total</code>
</p>
</td>
<td valign="top">
<p>
The number of documents that were successfully processed.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">updated</code>
</p>
</td>
<td valign="top">
<p>
The number of documents that were successfully updated.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">deleted</code>
</p>
</td>
<td valign="top">
<p>
The number of documents that were successfully deleted.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">batches</code>
</p>
</td>
<td valign="top">
<p>
The number of scroll responses pulled back by the update by query.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">version_conflicts</code>
</p>
</td>
<td valign="top">
<p>
The number of version conflicts that the update by query hit.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">noops</code>
</p>
</td>
<td valign="top">
<p>
The number of documents that were ignored because the script used for
the update by query returned a <code class="literal">noop</code> value for <code class="literal">ctx.op</code>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">retries</code>
</p>
</td>
<td valign="top">
<p>
The number of retries attempted by update by query. <code class="literal">bulk</code> is the number of bulk
actions retried, and <code class="literal">search</code> is the number of search actions retried.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">throttled_millis</code>
</p>
</td>
<td valign="top">
<p>
Number of milliseconds the request slept to conform to <code class="literal">requests_per_second</code>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">requests_per_second</code>
</p>
</td>
<td valign="top">
<p>
The number of requests per second effectively executed during the update by query.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">throttled_until_millis</code>
</p>
</td>
<td valign="top">
<p>
This field should always be equal to zero in an <code class="literal">_update_by_query</code> response. It only
has meaning when using the <a class="xref" href="docs-update-by-query.html#docs-update-by-query-task-api" title="Works with the Task API">Task API</a>, where it
indicates the next time (in milliseconds since epoch) a throttled request will be
executed again in order to conform to <code class="literal">requests_per_second</code>.
</p>
</td>
</tr>
<tr>
<td valign="top">
<p>
<code class="literal">failures</code>
</p>
</td>
<td valign="top">
<p>
Array of failures if there were any unrecoverable errors during the process. If
this is non-empty then the request aborted because of those failures.
Update by query is implemented using batches. Any failure causes the entire
process to abort, but all failures in the current batch are collected into the
array. You can use the <code class="literal">conflicts</code> option to prevent reindex from aborting on
version conflicts.
</p>
</td>
</tr>
</tbody>
</table>
</div>
<h4>
<a id="docs-update-by-query-task-api"></a>Works with the Task API<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>You can fetch the status of all running update by query requests with the
<a class="xref" href="tasks.html" title="Task management API">Task API</a>:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET _tasks?detailed=true&amp;actions=*byquery</pre>
</div>
<div class="console_widget" data-snippet="snippets/1511.console"></div>
<p>The responses looks like:</p>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "nodes" : {
    "r1A2WoRbTwKZ516z6NEs5A" : {
      "name" : "r1A2WoR",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "attributes" : {
        "testattr" : "test",
        "portsfile" : "true"
      },
      "tasks" : {
        "r1A2WoRbTwKZ516z6NEs5A:36619" : {
          "node" : "r1A2WoRbTwKZ516z6NEs5A",
          "id" : 36619,
          "type" : "transport",
          "action" : "indices:data/write/update/byquery",
          "status" : {    <a id="CO556-1"></a><i class="conum" data-value="1"></i>
            "total" : 6154,
            "updated" : 3500,
            "created" : 0,
            "deleted" : 0,
            "batches" : 4,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries": {
              "bulk": 0,
              "search": 0
            },
            "throttled_millis": 0
          },
          "description" : ""
        }
      }
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO556-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>This object contains the actual status. It is just like the response JSON
with the important addition of the <code class="literal">total</code> field. <code class="literal">total</code> is the total number
of operations that the reindex expects to perform. You can estimate the
progress by adding the <code class="literal">updated</code>, <code class="literal">created</code>, and <code class="literal">deleted</code> fields. The request
will finish when their sum is equal to the <code class="literal">total</code> field.</p>
</td>
</tr>
</table>
</div>
<p>With the task id you can look up the task directly. The following example
retrieves information about task <code class="literal">r1A2WoRbTwKZ516z6NEs5A:36619</code>:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619</pre>
</div>
<div class="console_widget" data-snippet="snippets/1512.console"></div>
<p>The advantage of this API is that it integrates with <code class="literal">wait_for_completion=false</code>
to transparently return the status of completed tasks. If the task is completed
and <code class="literal">wait_for_completion=false</code> was set on it, then it’ll come back with a
<code class="literal">results</code> or an <code class="literal">error</code> field. The cost of this feature is the document that
<code class="literal">wait_for_completion=false</code> creates at <code class="literal">.tasks/task/${taskId}</code>. It is up to
you to delete that document.</p>
<h4>
<a id="docs-update-by-query-cancel-task-api"></a>Works with the Cancel Task API<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>Any update by query can be cancelled using the <a class="xref" href="tasks.html" title="Task management API">Task Cancel API</a>:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel</pre>
</div>
<div class="console_widget" data-snippet="snippets/1513.console"></div>
<p>The task ID can be found using the <a class="xref" href="tasks.html" title="Task management API">tasks API</a>.</p>
<p>Cancellation should happen quickly but might take a few seconds. The task status
API above will continue to list the update by query task until this task checks
that it has been cancelled and terminates itself.</p>
<h4>
<a id="docs-update-by-query-rethrottle"></a>Rethrottling<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>The value of <code class="literal">requests_per_second</code> can be changed on a running update by query
using the <code class="literal">_rethrottle</code> API:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST _update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1</pre>
</div>
<div class="console_widget" data-snippet="snippets/1514.console"></div>
<p>The task ID can be found using the <a class="xref" href="tasks.html" title="Task management API">tasks API</a>.</p>
<p>Just like when setting it on the <code class="literal">_update_by_query</code> API, <code class="literal">requests_per_second</code>
can be either <code class="literal">-1</code> to disable throttling or any decimal number
like <code class="literal">1.7</code> or <code class="literal">12</code> to throttle to that level. Rethrottling that speeds up the
query takes effect immediately, but rethrotting that slows down the query will
take effect after completing the current batch. This prevents scroll
timeouts.</p>
<h4>
<a id="docs-update-by-query-slice"></a>Slicing<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>Update by query supports <a class="xref" href="search-request-body.html#sliced-scroll" title="Sliced Scroll">Sliced Scroll</a> to parallelize the updating process.
This parallelization can improve efficiency and provide a convenient way to
break the request down into smaller parts.</p>
<h5>
<a id="docs-update-by-query-manual-slice"></a>Manual slicing<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h5>
<p>Slice an update by query manually by providing a slice id and total number of
slices to each request:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "script": {
    "source": "ctx._source['extra'] = 'test'"
  }
}
POST twitter/_update_by_query
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "script": {
    "source": "ctx._source['extra'] = 'test'"
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1515.console"></div>
<p>Which you can verify works with:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET _refresh
POST twitter/_search?size=0&amp;q=extra:test&amp;filter_path=hits.total</pre>
</div>
<div class="console_widget" data-snippet="snippets/1516.console"></div>
<p>Which results in a sensible <code class="literal">total</code> like this one:</p>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "hits": {
    "total": {
        "value": 120,
        "relation": "eq"
    }
  }
}</pre>
</div>
<h5>
<a id="docs-update-by-query-automatic-slice"></a>Automatic slicing<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h5>
<p>You can also let update by query automatically parallelize using
<a class="xref" href="search-request-body.html#sliced-scroll" title="Sliced Scroll">Sliced Scroll</a> to slice on <code class="literal">_id</code>. Use <code class="literal">slices</code> to specify the number of
slices to use:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_update_by_query?refresh&amp;slices=5
{
  "script": {
    "source": "ctx._source['extra'] = 'test'"
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1517.console"></div>
<p>Which you also can verify works with:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST twitter/_search?size=0&amp;q=extra:test&amp;filter_path=hits.total</pre>
</div>
<div class="console_widget" data-snippet="snippets/1518.console"></div>
<p>Which results in a sensible <code class="literal">total</code> like this one:</p>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "hits": {
    "total": {
        "value": 120,
        "relation": "eq"
    }
  }
}</pre>
</div>
<p>Setting <code class="literal">slices</code> to <code class="literal">auto</code> will let Elasticsearch choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
there are multiple source indices, it will choose the number of slices based
on the index with the smallest number of shards.</p>
<p>Adding <code class="literal">slices</code> to <code class="literal">_update_by_query</code> just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
You can see these requests in the
<a class="xref" href="docs-update-by-query.html#docs-update-by-query-task-api" title="Works with the Task API">Tasks APIs</a>. These sub-requests are "child"
tasks of the task for the request with <code class="literal">slices</code>.
</li>
<li class="listitem">
Fetching the status of the task for the request with <code class="literal">slices</code> only contains
the status of completed slices.
</li>
<li class="listitem">
These sub-requests are individually addressable for things like cancellation
and rethrottling.
</li>
<li class="listitem">
Rethrottling the request with <code class="literal">slices</code> will rethrottle the unfinished
sub-request proportionally.
</li>
<li class="listitem">
Canceling the request with <code class="literal">slices</code> will cancel each sub-request.
</li>
<li class="listitem">
Due to the nature of <code class="literal">slices</code> each sub-request won’t get a perfectly even
portion of the documents. All documents will be addressed, but some slices may
be larger than others. Expect larger slices to have a more even distribution.
</li>
<li class="listitem">
Parameters like <code class="literal">requests_per_second</code> and <code class="literal">max_docs</code> on a request with
<code class="literal">slices</code> are distributed proportionally to each sub-request. Combine that with
the point above about distribution being uneven and you should conclude that
using <code class="literal">max_docs</code> with <code class="literal">slices</code> might not result in exactly <code class="literal">max_docs</code> documents
being updated.
</li>
<li class="listitem">
Each sub-request gets a slightly different snapshot of the source index
though these are all taken at approximately the same time.
</li>
</ul>
</div>
<h6>
<a id="docs-update-by-query-picking-slices"></a>Picking the number of slices<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h6>
<p>If slicing automatically, setting <code class="literal">slices</code> to <code class="literal">auto</code> will choose a reasonable
number for most indices. If you’re slicing manually or otherwise tuning
automatic slicing, use these guidelines.</p>
<p>Query performance is most efficient when the number of <code class="literal">slices</code> is equal to the
number of shards in the index. If that number is large, (for example,
500) choose a lower number as too many <code class="literal">slices</code> will hurt performance. Setting
<code class="literal">slices</code> higher than the number of shards generally does not improve efficiency
and adds overhead.</p>
<p>Update performance scales linearly across available resources with the
number of slices.</p>
<p>Whether query or update performance dominates the runtime depends on the
documents being reindexed and cluster resources.</p>
<h4>
<a id="picking-up-a-new-property"></a>Pick up a new property<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/docs/update-by-query.asciidoc">edit</a>
</h4>
<p>Say you created an index without dynamic mapping, filled it with data, and then
added a mapping value to pick up more fields from the data:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT test
{
  "mappings": {
    "dynamic": false,   <a id="CO557-1"></a><i class="conum" data-value="1"></i>
    "properties": {
      "text": {"type": "text"}
    }
  }
}

POST test/_doc?refresh
{
  "text": "words words",
  "flag": "bar"
}
POST test/_doc?refresh
{
  "text": "words words",
  "flag": "foo"
}
PUT test/_mapping   <a id="CO557-2"></a><i class="conum" data-value="2"></i>
{
  "properties": {
    "text": {"type": "text"},
    "flag": {"type": "text", "analyzer": "keyword"}
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1519.console"></div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO557-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>This means that new fields won’t be indexed, just stored in <code class="literal">_source</code>.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO557-2"><i class="conum" data-value="2"></i></a></p>
</td>
<td align="left" valign="top">
<p>This updates the mapping to add the new <code class="literal">flag</code> field. To pick up the new
field you have to reindex all documents with it.</p>
</td>
</tr>
</table>
</div>
<p>Searching for the data won’t find anything:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST test/_search?filter_path=hits.total
{
  "query": {
    "match": {
      "flag": "foo"
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1520.console"></div>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "hits" : {
    "total": {
        "value": 0,
        "relation": "eq"
    }
  }
}</pre>
</div>
<p>But you can issue an <code class="literal">_update_by_query</code> request to pick up the new mapping:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">POST test/_update_by_query?refresh&amp;conflicts=proceed
POST test/_search?filter_path=hits.total
{
  "query": {
    "match": {
      "flag": "foo"
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/1521.console"></div>
<div class="pre_wrapper lang-console-result">
<pre class="programlisting prettyprint lang-console-result">{
  "hits" : {
    "total": {
        "value": 1,
        "relation": "eq"
    }
  }
}</pre>
</div>
<p>You can do the exact same thing when adding a field to a multifield.</p>
</div>
<div class="navfooter">
<span class="prev">
<a href="docs-update.html">« Update API</a>
</span>
<span class="next">
<a href="docs-multi-get.html">Multi get (mget) API »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>